Improving the Transient Times for Distributed Stochastic Gradient Methods

Abstract

We consider the distributed optimization problem where $n$ agents, each possessing a local cost function, collaboratively minimize the average of the $n$ cost functions over a connected network. Assuming stochastic gradient information is available, we study a distributed stochastic gradient algorithm, called exact diffusion with adaptive stepsizes (EDAS), adapted from the Exact Diffusion method [1] and NIDS [2], and perform a non-asymptotic convergence analysis. We not only show that EDAS asymptotically achieves the same network-independent convergence rate as centralized stochastic gradient descent (SGD) for minimizing strongly convex and smooth objective functions, but also characterize the transient time needed for the algorithm to approach the asymptotic convergence rate, which behaves as $K_{T}=\mathcal{O}\left(\frac{n}{1-\lambda_{2}}\right)$, where $1-\lambda_{2}$ stands for the spectral gap of the mixing matrix. To the best of our knowledge, EDAS achieves the shortest transient time when the average of the $n$ cost functions is strongly convex and each cost function is smooth. Numerical simulations further corroborate and strengthen the obtained theoretical results.
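To make the quantity $n/(1-\lambda_{2})$ concrete, the following is a minimal sketch that evaluates it for one particular network. Everything specific here is an assumption for illustration and not taken from the paper: the ring topology, the weight `w = 1/3` on each neighbor, the reading of $\lambda_{2}$ as the second-largest eigenvalue of a symmetric doubly stochastic mixing matrix, and the helper names. The constant hidden by $\mathcal{O}(\cdot)$ is unknown, so only the scaling is computed, not an iteration count.

```python
import math


def ring_mixing_eigenvalues(n, w=1.0 / 3.0):
    """Eigenvalues (descending) of an assumed ring mixing matrix.

    W = (1 - 2w) I + w (P + P^T), with P the cyclic shift on n agents.
    W is symmetric and circulant, so its eigenvalues are
    1 - 2w + 2w * cos(2*pi*k/n) for k = 0, ..., n - 1.
    """
    if n < 3:
        raise ValueError("a ring needs at least 3 agents")
    if not 0.0 < w <= 0.5:
        raise ValueError("w must lie in (0, 0.5] so that W has nonnegative entries")
    eigs = [1 - 2 * w + 2 * w * math.cos(2 * math.pi * k / n) for k in range(n)]
    return sorted(eigs, reverse=True)


def spectral_gap(n, w=1.0 / 3.0):
    """Spectral gap 1 - lambda_2, with lambda_2 the second-largest eigenvalue."""
    return 1.0 - ring_mixing_eigenvalues(n, w)[1]


def transient_scale(n, w=1.0 / 3.0):
    """The ratio n / (1 - lambda_2); the O(.) constant is not included."""
    return n / spectral_gap(n, w)


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, spectral_gap(n), transient_scale(n))
```

For this ring construction the gap equals $2w\,(1-\cos(2\pi/n))$, which shrinks roughly like $1/n^{2}$, so the ratio $n/(1-\lambda_{2})$ grows roughly like $n^{3}$. Better-connected networks have a larger gap and hence a smaller ratio.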

Similar resources

Distributed Stochastic Gradient MCMC

Probabilistic inference on a big data scale is becoming increasingly relevant to both the machine learning and statistics communities. Here we introduce the first fully distributed MCMC algorithm based on stochastic gradients. We argue that stochastic gradient MCMC algorithms are particularly suited for distributed inference because individual chains can draw mini-batches from their local pool ...

Variance Reduction for Distributed Stochastic Gradient Descent

Variance reduction (VR) methods boost the performance of stochastic gradient descent (SGD) by enabling the use of larger, constant stepsizes and preserving linear convergence rates. However, current variance reduced SGD methods require either high memory usage or an exact gradient computation (using the entire dataset) at the end of each epoch. This limits the use of VR methods in practical dis...

Without-Replacement Sampling for Stochastic Gradient Methods

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In contrast, sampling without replacement is far less understood, yet in practice it is very common, often easier to implement, and usually performs better. In this paper, we provide competitive convergence guarantees for without-replacement sampling...

Semi-Stochastic Gradient Descent Methods

In this paper we study the problem of minimizing the average of a large number (n) of smooth convex loss functions. We propose a new method, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs in each of which a single full gradient and a random number of stochastic gradients is computed, following a geometric law. The total work needed for the method to output an ε-ac...

Towards Stochastic Conjugate Gradient Methods

The method of conjugate gradients provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore a number of ways to adopt ideas from conjugate gradient in the stochastic setting, using fast Hessian-vector products to obtain curvature information cheaply. I...

Journal

Journal title: IEEE Transactions on Automatic Control

Year: 2022

ISSN: 0018-9286, 1558-2523, 2334-3303

DOI: https://doi.org/10.1109/tac.2022.3201141